INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & MANAGEMENT

# 16\*16 BIT LOW POWER HIGH SPEED FIXED POINT MULTIPLIER

Jasbir Kaur\* & Dr. Neelam Rup Prakash

#### ABSTRACT

Multipliers are key execution units in ALUs and DSPs. Because multiplication dominates the execution time of most DSP algorithms, a high-speed multiplier is highly desirable. With the ever-increasing demand for computing power on battery-operated mobile devices, the design emphasis has shifted from optimizing delay and area alone to minimizing power dissipation while maintaining high performance. In this paper, a high-speed, low-power multiplier has been designed using the Cadence Virtuoso tool.

Keywords: Array Multiplier, Transmission AND gate, RCA, low power.

### I. INTRODUCTION

Multiplication is a fundamental arithmetic operation. It is used in many computation-intensive functions in DSP applications, such as convolution, the FFT and filtering, and multiplication time is still the dominant factor in determining the instruction cycle time of a DSP chip. In its simplest form, multiplication can be viewed as repeated addition: the number to be added is called the multiplicand, the number of times it is added is called the multiplier, and the result is called the product. The basic operations in multiplication are generating the partial products and accumulating (adding) them. To increase the speed of multiplication, both steps must be optimized.
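The two steps named above, generating partial products and accumulating them, can be illustrated with a short behavioral sketch. This is a hypothetical software model (the function names are illustrative, not from the paper): each partial product is the multiplicand ANDed with one multiplier bit and shifted by that bit's position, and the product is their sum.

```python
def partial_products(multiplicand: int, multiplier: int, width: int) -> list[int]:
    """Generate one shifted partial product per multiplier bit."""
    rows = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # ANDing every multiplicand bit with multiplier bit i yields either
        # the multiplicand or zero; the row is then shifted left by i
        rows.append((multiplicand if bit else 0) << i)
    return rows


def multiply(multiplicand: int, multiplier: int, width: int = 16) -> int:
    """Accumulate (add) the partial products to form the product."""
    product = 0
    for row in partial_products(multiplicand, multiplier, width):
        product += row
    return product


print(partial_products(0b1011, 0b0110, 4))  # [0, 22, 44, 0]
print(multiply(11, 6, 4))                   # 66
```

In hardware the first step maps onto the AND gates and the second onto the adders, which is why the rest of the paper optimizes those two components.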

Various architectures, such as the array and Wallace tree multipliers, have been used to implement multipliers. Generating the partial products is the first step in multiplication, and AND gates are used for this purpose. The next step is adding these partial products, for which adders are used. Adders and AND gates are therefore the basic units of any multiplier. Fast adder circuits increase the speed of the multiplier, and the choice of multiplier architecture also affects speed. AND gates are also used inside the adder circuits. Therefore, in this work the AND gate and adder circuits have been designed at the component level and …

#### **Related Work**

To build the multiplier described above, an AND gate to generate the partial products and an adder circuit to add them have been designed. An AND gate can be built using conventional CMOS or using transmission gates. A transmission gate is an electronic switch that either blocks or passes the signal level from its input to its output. It has three inputs, the signal input (source) and the n-gate and p-gate control inputs, and one output (drain). A transmission gate is simply a complementary nMOS/pMOS transistor pair connected in parallel, and it is used where high-speed performance is needed. In this work, the AND gate has been designed using transmission gates.
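The transmission gate as a switch, and one way of building an AND gate from it, can be modeled at the logic level. This sketch assumes a common topology: a transmission gate controlled by B passes A to the output, and an nMOS pull-down driven by NOT B forces the output low when B = 0. It is not necessarily the exact circuit of Figure 1, and the function names are hypothetical.

```python
from typing import Optional


def transmission_gate(data: int, n_gate: int, p_gate: int) -> Optional[int]:
    """Ideal switch: passes data when n_gate = 1 and p_gate = 0, else high impedance (None)."""
    if n_gate == 1 and p_gate == 0:
        return data
    return None


def tg_and(a: int, b: int) -> int:
    """Assumed TG AND: a TG controlled by b passes a; a pull-down gated by NOT b gives 0."""
    b_bar = 1 - b
    out = transmission_gate(a, n_gate=b, p_gate=b_bar)
    if out is None:
        # TG is off because b = 0; the nMOS driven by b_bar = 1 pulls the output to 0
        out = 0
    return out


for a in (0, 1):
    for b in (0, 1):
        print(a, b, tg_and(a, b))
```

Because the output is always actively driven (either through the TG or the pull-down), this topology avoids a floating node without needing a full static CMOS NAND plus inverter.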

An adder circuit adds its input bits and produces a sum and a carry. A fast adder built from 28 CMOS transistors (the "28 CMOS" adder) has been designed in place of the conventional adder, which uses 32 transistors. The adder and the AND gate both strongly influence the performance parameters of the multiplier.

Array multiplier: Because of its regular structure, the array multiplier is a well-known architecture. The partial products are generated in parallel, and their addition proceeds row by row. An $n \times m$ bit array multiplier contains $n \times m$ two-input AND gates and $(m-1)$ $n$-bit adders. The AND stage contributes one AND gate delay at all levels, while the adder levels 1 to $(m-1)$ contribute $(m-1)$ times the delay of one $n$-bit adder:

$$T_{array} = T_{AND} + (m-1)\,T_{adder,n}$$

The delay therefore grows linearly with the number of multiplier bits $m$; a logarithmic dependence on operand size requires a tree-based reduction such as the Wallace tree.
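A minimal behavioral sketch of the row-by-row accumulation described above, assuming ideal gates. `array_multiply` and `array_delay` are hypothetical helper names; the first counts AND gates and adder rows while computing the product, and the second encodes the stated delay model (one AND delay plus $(m-1)$ $n$-bit adder delays) with per-unit delays supplied by the caller.

```python
def array_multiply(a: int, b: int, n: int, m: int) -> tuple[int, int, int]:
    """Behavioral model of an n x m array multiplier (a has n bits, b has m bits).

    Returns (product, number of 2-input AND gates, number of n-bit adder rows).
    """
    mask = (1 << n) - 1
    # Partial-product row j is a AND b_j: all n bits of a gated by multiplier bit j
    pp = [(a & mask) if (b >> j) & 1 else 0 for j in range(m)]
    and_gates = n * m
    low = pp[0] & 1        # product bit 0 leaves the array directly
    acc = pp[0] >> 1       # the other n-1 bits feed the first adder row
    adder_rows = 0
    for j in range(1, m):
        s = acc + pp[j]        # one n-bit adder row: n-bit sum plus carry out
        adder_rows += 1
        low |= (s & 1) << j    # LSB of each row is product bit j
        acc = s >> 1           # remaining sum bits and carry move to the next row
    return low | (acc << m), and_gates, adder_rows


def array_delay(m: int, t_and: float, t_adder: float) -> float:
    """Critical path: one AND delay plus (m-1) n-bit adder delays."""
    return t_and + (m - 1) * t_adder


print(array_multiply(21, 10, 16, 16))  # (210, 256, 15)
```

For a 16\*16 array this gives 256 AND gates and 15 adder rows, matching the $n \times m$ and $(m-1)$ counts in the text.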

# II. METHOD

In this paper, a 16\*16 bit array multiplier has been designed at the component level. First, the AND gate was designed at the component level using both conventional CMOS and transmission-gate CMOS. Next, various adder circuits were designed, and finally the array architecture was designed. The adder circuits designed are the 32 CMOS, 28 CMOS, 20 CMOS and 10 CMOS adders.

Because a 16 \* 16 bit multiplier is complex to build at the component level, an 8\*8 bit multiplier is first built from 4\*4 bit multiplier units. The decomposition shown below is used for simplicity: each 16 bit operand is split into a high byte and a low byte, four 8\*8 bit partial products are formed, and they are added with the appropriate shifts. This method reduces circuit complexity and makes the 16 \* 16 bit hybrid multiplier easier to build.

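| Partial product | Operands | Weight | Product bits spanned |
|-----------------|----------|--------|----------------------|
| PP0 | $A_{7-0} \times B_{7-0}$ | $2^{0}$ | $P_{15-0}$ |
| PP1 | $A_{15-8} \times B_{7-0}$ | $2^{8}$ | $P_{23-8}$ |
| PP2 | $A_{7-0} \times B_{15-8}$ | $2^{8}$ | $P_{23-8}$ |
| PP3 | $A_{15-8} \times B_{15-8}$ | $2^{16}$ | $P_{31-16}$ |

$$P_{31-0} = PP3 \cdot 2^{16} + (PP1 + PP2) \cdot 2^{8} + PP0$$

The product is delivered as the bytes $P_{31-24}$, $P_{23-16}$, $P_{15-8}$ and $P_{7-0}$.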
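A behavioral sketch of this hierarchical decomposition, assuming ideal 4\*4 leaf blocks. `mult4x4` and `mult_split` are hypothetical names, and the leaf is modeled with Python's `*` operator rather than as a gate-level circuit; the point is the split-and-shift structure, applied recursively from 16\*16 down to 4\*4.

```python
def mult4x4(a: int, b: int) -> int:
    """Leaf block: 4x4 multiplier (modeled behaviorally here)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mult_split(a: int, b: int, n: int) -> int:
    """n x n product built from four (n/2) x (n/2) products, down to 4x4 blocks."""
    if n == 4:
        return mult4x4(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h
    pp0 = mult_split(a_lo, b_lo, h)   # A_low  x B_low,  weight 2^0
    pp1 = mult_split(a_hi, b_lo, h)   # A_high x B_low,  weight 2^h
    pp2 = mult_split(a_lo, b_hi, h)   # A_low  x B_high, weight 2^h
    pp3 = mult_split(a_hi, b_hi, h)   # A_high x B_high, weight 2^(2h)
    return pp0 + ((pp1 + pp2) << h) + (pp3 << (2 * h))


print(mult_split(21, 10, 16))  # 210
```

With `n = 16` the recursion uses four 8\*8 blocks, each of which uses four 4\*4 blocks, mirroring the 16\*16 from 8\*8 from 4\*4 build order.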

The adder used here is the 28 CMOS adder, because the results show that it is the best circuit for high speed and low power. It performs better both as a 1 bit full adder and in the 4 bit RCA. In the 4 bit RCA, the carry bit ripples through the chain of cascaded full adders from each bit to the next higher-order full adder. Of all adder architectures, the RCA occupies the smallest area and offers good performance for random input data. The area of the adder is proportional to the number of bits n. The worst-case delay grows linearly with the length of the carry propagation path, which is set by the number of operand bits n.
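The ripple-carry behavior described above can be sketched as a chain of 1 bit full adders. This is a hypothetical logic-level model (names are illustrative): besides the sum, `ripple_carry_add` reports how many consecutive full adders the longest carry travelled through, and `rca_worst_delay` encodes the linear worst case, assuming every stage has the same carry delay `t_fa`.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1 bit full adder: sum = a XOR b XOR cin, carry = majority(a, b, cin)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(x: int, y: int, n: int) -> tuple[int, int, int]:
    """n bit RCA; returns (sum, carry out, full adders on the longest carry chain)."""
    total, carry = 0, 0
    run = longest = 0
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = full_adder(a, b, carry)   # carry ripples into the next stage
        total |= s << i
        if a & b:
            run = 1          # a new carry is generated at this stage
        elif (a ^ b) and run:
            run += 1         # the incoming carry propagates through this stage
        else:
            run = 0          # no carry leaves this stage
        longest = max(longest, run)
    return total, carry, longest


def rca_worst_delay(n: int, t_fa: float) -> float:
    """Worst case: the carry ripples through all n full adders."""
    return n * t_fa


print(ripple_carry_add(0b1111, 0b0001, 4))  # (0, 1, 4): carry crosses all 4 stages
```

The input 1111 + 0001 exercises the worst case, which is why the delay model is linear in n.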

The AND gate used to generate the partial products has been designed with transmission gates, because its delay and power are lower than those of the conventional AND gate. Logic circuits built with transmission gates instead of conventional static CMOS can be made more compact, which is an important consideration for silicon implementation. A CMOS transmission gate consists of two MOSFETs: an n-channel device, responsible for correctly transmitting logic zeros, and a p-channel device, responsible for correctly transmitting logic ones. Figure 1 shows the AND gate circuit using transmission gates. Its delay and power are 3.76 ps ($3.76 \times 10^{-12}$ s) and 11.86 µW ($11.86 \times 10^{-6}$ W) respectively, compared with 233.4 ps ($233.4 \times 10^{-12}$ s) and 16.35 µW ($16.35 \times 10^{-6}$ W) for the conventional CMOS AND gate.



Figure 1: AND gate using transmission gate
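Why the transmission gate needs both devices can be shown with a first-order static model of pass-transistor voltage levels. The supply and threshold values below are hypothetical (the paper does not state them), and the model ignores body effect and on-resistance: an nMOS alone cannot pull its output above VDD - VTN, and a pMOS alone cannot pull it below |VTP|.

```python
VDD_MV = 1800   # hypothetical supply voltage, in millivolts
VTN_MV = 400    # hypothetical nMOS threshold voltage
VTP_MV = 400    # hypothetical |pMOS threshold voltage|


def nmos_pass(v_in: int) -> int:
    """nMOS alone: strong 0, but a passed 1 stops rising at VDD - VTN."""
    return min(v_in, VDD_MV - VTN_MV)


def pmos_pass(v_in: int) -> int:
    """pMOS alone: strong 1, but a passed 0 stops falling at |VTP|."""
    return max(v_in, VTP_MV)


def tg_pass(v_in: int) -> int:
    """Both devices in parallel: the nMOS completes low levels, the pMOS high levels."""
    if v_in <= VDD_MV - VTN_MV:
        return nmos_pass(v_in)
    return pmos_pass(v_in)


for v in (0, VDD_MV):
    print(v, nmos_pass(v), pmos_pass(v), tg_pass(v))
```

Under these assumptions the nMOS degrades a logic 1 to 1400 mV and the pMOS degrades a logic 0 to 400 mV, while the parallel pair passes both levels without loss.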

| Circuit | Proposed power consumption (µW) | Proposed delay (ps) | Conventional power consumption (µW) | Conventional delay (ps) | Proposed PDP (fJ, ×10<sup>-15</sup> W·s) | Conventional PDP (fJ, ×10<sup>-15</sup> W·s) |
|----------|-------|------|-------|-------|----------|---------|
| AND gate | 11.86 | 3.76 | 16.35 | 233.4 | 0.044593 | 3.81609 |

Table 1: Comparison of proposed AND gate

Table 1 compares the proposed and conventional AND gates. The power consumption of the proposed AND gate is 11.86 µW compared with 16.35 µW for the conventional CMOS gate, and its delay is 3.76 ps compared with 233.4 ps.
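The power-delay product (PDP) in Table 1 is simply power multiplied by delay. A small sketch of the unit conversion (the helper name `pdp_fj` is hypothetical): since 1 µW × 1 ps = 10<sup>-18</sup> J = 10<sup>-3</sup> fJ, the table values can be reproduced from the measured power and delay.

```python
def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in femtojoules (1 uW x 1 ps = 1e-18 J = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


proposed = pdp_fj(11.86, 3.76)        # transmission-gate AND
conventional = pdp_fj(16.35, 233.4)   # conventional CMOS AND
print(f"proposed {proposed:.6f} fJ, conventional {conventional:.5f} fJ")
print(f"improvement factor {conventional / proposed:.1f}x")
```

The same conversion applies to every PDP column in this paper, where "femto watt sec" means femtojoules.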

Fast 1 bit full adder circuits have been designed using the 32 CMOS, 28 CMOS, 20 CMOS and 10 CMOS configurations, which are widely accepted and used in numerous applications. Speed was emphasized in the logical realization of the full adder because, whenever two or more full adders are cascaded to perform multi-bit addition, the carry path often forms a critical delay that limits the overall system performance. To capture both low power and high speed, the power-delay product (PDP) was calculated, and the PDP of the 28 CMOS adder is lower than that of the other adder configurations. The four adder configurations were then cascaded into 4 bit, 8 bit and 16 bit RCAs, and the results again show the 28 CMOS configuration to be better than the 32 CMOS, 20 CMOS and 10 CMOS configurations for the 4 bit, 8 bit and 16 bit RCAs.

| Type of the circuit | Configuration | Power consumption (µW) | Delay (ps) | PDP (fJ) |
|---------------------|---------------|------------------------|------------|----------|
| 1 bit full adder    | 32 CMOS       | 25.48                  | 278.1      | 7.085988 |
|                     | 28 CMOS       | 10.6                   | 58.28      | 0.617768 |
|                     | 20 CMOS       | 56.44                  | 30460      | 1719.1624 |
|                     | 10 CMOS       | 183100                 | 5085       | 931063.500 |

Table 2: Comparison of 1 bit full adder configurations

Table 2 compares the 32 CMOS, 28 CMOS, 20 CMOS and 10 CMOS adder configurations. The results show that the 28 CMOS adder, with a delay of 58.28 ps and a power consumption of 10.6 µW, outperforms the 32, 20 and 10 CMOS adder configurations.
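The ranking in Table 2 follows directly from the PDP. A short sketch that recomputes it from the tabulated power and delay values (the dictionary and helper names are hypothetical):

```python
# Table 2 measurements: configuration -> (power in uW, delay in ps)
full_adders = {
    "32 CMOS": (25.48, 278.1),
    "28 CMOS": (10.6, 58.28),
    "20 CMOS": (56.44, 30460),
    "10 CMOS": (183100, 5085),
}


def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in femtojoules (1 uW x 1 ps = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


# Sort configurations from lowest (best) to highest PDP
ranked = sorted(full_adders, key=lambda name: pdp_fj(*full_adders[name]))
for name in ranked:
    print(f"{name}: {pdp_fj(*full_adders[name]):.6f} fJ")
```

PDP is used as the figure of merit because a circuit can trivially trade power for delay; the 28 CMOS adder is best on both factors here, so its lead in PDP is unambiguous.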



Figure 2: Circuit diagram of 8 bit RCA

Figure 2 shows the circuit diagram of the 8 bit RCA, Figure 3 shows its internal circuit diagram, and a further figure shows its output waveform.

# [Kaur, 7(3): July-September 2017]

### ISSN 2277 - 5528 Impact Factor- 4.015



Figure 3 : Internal circuit diagram of 8 bit RCA

Table 3 shows the results for the 4 bit, 8 bit and 16 bit RCAs. For all three widths, the PDP of the 28 CMOS RCA is lower than that of the 32, 20 and 10 CMOS RCAs, so the 28 CMOS configuration performs best in every case.

| Type of the circuit | Configuration | Power consumption (µW) | Delay (ps) | PDP (fJ) |
|---------------------|---------------|------------------------|------------|----------|
| 4 bit RCA           | 32 CMOS       | 132.6                  | 50840      | 6741.384 |
|                     | 28 CMOS       | 119.8                  | 549.3      | 65.806140 |
|                     | 20 CMOS       | 53.01                  | 50810      | 2693.438 |
|                     | 10 CMOS       | 15.32                  | 50800      | 778.256 |
| 8 bit RCA           | 32 CMOS       | 252.6                  | 50840      | 12842.184 |
|                     | 28 CMOS       | 229.9                  | 550.3      | 126.51397 |
|                     | 20 CMOS       | 169.4                  | 50810      | 8607.214 |
|                     | 10 CMOS       | 869.6                  | 50810      | 44184.376 |
| 16 bit RCA          | 32 CMOS       | 513.7                  | 50900      | 26147.330 |
|                     | 28 CMOS       | 440.2                  | 553.2      | 243.518640 |
|                     | 20 CMOS       | 365.1                  | 50830      | 18558.033 |
|                     | 10 CMOS       | 2289                   | 150700     | 34495.230 |

Table 3: Comparison of 4 bit, 8 bit, 16 bit RCA


Simulation Results: The 16 \*16 bit array multiplier has been designed in the Cadence Virtuoso tool from 8\*8 bit multipliers, and each 8\*8 bit multiplier has in turn been built from 4\*4 bit multipliers for simplicity. The AND gates that generate the partial products use transmission gates, and the partial products are added using the 28 CMOS 1 bit full adder configuration.



Figure 5 : Internal circuit diagram of 16 \* 16 bit array multiplier

Figure 5 shows the internal circuit diagram of the 16 \*16 bit array multiplier, which is built from the sub-blocks discussed previously. Correct operation of the circuit is verified from the output waveform shown in Figure 7.



Figure 6 : Circuit diagram of 16 \* 16 bit array multiplier



Table 4 summarizes the results for the designed circuits. The output waveform confirms the correct operation of the 16\*16 bit array multiplier for the input 21 \* 10; other input combinations were also checked and verified. Figure 7 shows outputs $y_0$ to $y_{20}$; outputs $y_{21}$ to $y_{31}$ are zero. The power and delay values are also labeled on the output waveform.
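To read the waveform for the 21 \* 10 test case, it helps to know which outputs should be high. A small sketch (the helper name `product_bits` is hypothetical) lists the expected logic level of each output $y_0$ to $y_{31}$:

```python
def product_bits(a: int, b: int, width: int = 32) -> list[int]:
    """Expected logic level of outputs y0..y(width-1) for the product a*b."""
    p = a * b
    return [(p >> i) & 1 for i in range(width)]


bits = product_bits(21, 10)
high = [i for i, v in enumerate(bits) if v]
print(high)  # [1, 4, 6, 7]: 210 = 2 + 16 + 64 + 128
```

So for this input only $y_1$, $y_4$, $y_6$ and $y_7$ should be at logic 1, and every higher output, including $y_{21}$ to $y_{31}$, should stay at 0.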

| Type of the circuit | Configuration | Power consumption (µW) | Delay (ps) | PDP (fJ) |
|---------------------|---------------|------------------------|------------|----------|
| AND gate | Transmission gate | 11.86 | 3.76 | 0.044593 |
| 1 bit full adder | 28 CMOS | 10.6 | 58.28 | 0.617768 |
| 4 bit RCA | 28 CMOS | 119.8 | 549.3 | 65.806140 |
| 8 bit RCA | 28 CMOS | 229.9 | 550.3 | 126.51397 |
| 16 bit RCA | 28 CMOS | 440.2 | 553.2 | 243.518640 |
| 16\*16 bit array multiplier | | 428.6 | 1255 | |

Table 4: Summary of results for the designed circuits
